Abstract

As the Internet faces scalability, mobility, and security is- 
sues, new network architectures are being proposed to better 
accommodate the needs of modern applications. In particu- 
lar, Content-Oriented Networking (CON) has attracted con- 
siderable attention, from both academic and industrial com- 
munities, as an alternative future Internet architecture. CON 
aims to decouple content from hosts, at the network layer, by
naming data rather than hosts. It comes with a potential for 
a wide range of benefits, including reduced congestion and 
improved delivery speed by means of content caching, sim- 
pler configuration of network devices, and security at the 
data level. However, it remains an interesting open question 
whether or not, and to what extent, this emerging network- 
ing paradigm bears new privacy challenges. In this paper, 
we present a systematic privacy analysis of CON and the 
common building blocks among its various architectural in- 
stances, in order to highlight emerging privacy threats and 
analyze a few potential countermeasures. Finally, we com- 
pare the feasibility and effectiveness of privacy-enhancing 
technologies in CON as opposed to today's Internet, and 
conclude by identifying a list of open research challenges. 

1 Introduction 

In the last few years, the growing penetration of the Web 
in modern society has produced a massive growth of data 
routed in the Internet. Global IP traffic - driven by the
soaring proliferation of mobile and streaming data - is ex- 
pected to increase 3-fold over the next 5 years [24]. Besides 
exceeding its expectations, the Internet has also stretched 
many of the initial assumptions, creating issues that chal- 
lenge its underlying communication model. Users and ap- 
plications increasingly operate in terms of content, making 
it difficult to conform to IP's requirement to communicate 
by discovering and specifying hosts and locations [56]. 

Coping with vast amounts of traffic becomes extremely 
arduous, due to a number of issues deep-rooted in the net- 
work design. The quest for improving scalability and (cost) 
efficiency of content delivery has led to the design of overlay
networks, such as Peer-to-Peer (P2P) networks and Content
Distribution Networks (CDNs). However, overlay networks
often complicate network management and application de- 
velopment. P2P reduces servers' load by distributing con- 
tent among peers, using dynamic and fault-tolerant net- 
works, but it often results in increased inter-provider traf- 
fic [69, 52]. Moreover, P2P is application-dependent, thus, 
confined to a specific usage. CDNs are used to redirect
users' requests to the geographically closest caches in order to
reduce traffic; however, they require ad-hoc infrastructures
and only some providers can afford the related deployment 
costs. 

Frequent attacks against SSL [33, 42], as well as the 
hacking of certification authorities [46], accentuate the 
weakness of endpoint authentication mechanisms, as an 
endpoint can only authenticate the counterpart, but not the 
message. Furthermore, today's Internet often struggles to 
cope with mobility and resilience to disruption. The transport
layer is, by design, unable to manage mobile parties, and
add-on features - e.g., Mobile IPv6 (MIPv6) and Hierarchical
MIPv6 [18] - have been suggested, albeit suffering from
handoff latency and packet losses [25].

Motivated by these issues, new architectures have been 
proposed, in the last few years, aiming to redesign the
Internet¹ and accommodate content-oriented applications. In
particular, Content-Oriented Networking (CON) [21] aims
to decouple content from hosts at the network layer, by
relying on the publish/subscribe paradigm [38], and shifts 
identification from host to content, so that this can be lo- 
cated anywhere in the network. 

1.1 Roadmap & Contributions 

The content-centric communication paradigm introduced 
by CON relies on naming data itself, rather than its location. 
Content is self-contained, has a unique name, and can be 
retrieved by means of an interest for that name. Also, it can 
be cached in any arbitrary location and be digitally signed 
to ensure its integrity and authenticity. 

CON comes with a potential for a wide range of benefits,
including reduced congestion and improved delivery speed
by means of content caching, simpler configuration of network
devices, and security at the data level. However, it
remains an interesting open question whether or not, and to
what extent, this emerging networking paradigm bears new
privacy challenges. While privacy-friendly features, such
as the lack of source/destination addresses, seem to help
privacy in CON, a closer look at some of the design choices
unveils a number of open issues. This paper systematically
studies privacy in CON as a generic paradigm, and shows
that it introduces several worrisome issues.

¹ See, e.g., NSF's Future Internet Architecture multi-million program [57].

First, we analyze the implications of caching - one of the 
crucial features in CON used to reduce traffic and improve 
delivery speed - on user privacy. We show that, as nodes 
cache frequently requested content, any node can infer con- 
tent consumed by others using timing information. 

Next, we focus on content privacy: since publicly available
content is not encrypted in CON and routers now handle
content objects directly, any router can easily inspect content.
By contrast, packet inspection in today's Internet requires
routers to reassemble packets, which is very inefficient, as
a router's primary job is to forward packets, and often requires
dedicated hardware [9].

Further, observe that, as content in CON is retrieved us- 
ing names that most likely are semantically related to the 
content itself, an attacker could infer sensitive information 
about a user, by monitoring her requests: we refer to this 
issue as name privacy. Finally, we look at the problem of 
signature privacy: since digital signatures of CON pack- 
ets need to be publicly verifiable, the identity of a content 
signer may be easily inferred by looking at the signature. 

Throughout the paper, we analyze different attack scenarios
that threaten privacy in CON. For each setting, we describe
the attacker capabilities and the attack's impact on user pri- 
vacy. We suggest several countermeasures and detail their 
strengths and weaknesses, while ensuring that they would
require minimal changes to the CON architecture. To the best
of our knowledge, this work represents the first step towards 
assessing CON privacy issues. In the process, we also high- 
light a number of challenging open problems that should be 
addressed before large-scale deployment of CON instantia- 
tions. 



1.2 Paper Organization 

The next section presents an overview of Content-Oriented
Networking (CON). Then, we provide a thorough analy- 
sis of a few privacy challenges in CON and detail possible 
countermeasures in Section 3. In Section 4, we compare the 
feasibility and effectiveness of privacy-enhancing technolo- 
gies in CON as opposed to today's Internet. Finally, after 
summarizing related work in Section 5, the paper concludes 
in Section 6. 



2 Content-Oriented Networking (CON)

This section provides a high-level description of Content-
Oriented Networking (CON). We present the main components
common to most available CON architectures,
along with their respective design choices. 

In the past few years, several future Internet architectures 
have been proposed that realize CON. The most promi- 
nent ones include DONA [47], Netinf [4], CCN [45], 
LANES [71], TRIAD [43], and CBCB [17]. 

We now review their macro building blocks: 

1. Named content: In CON, objects are always named to
facilitate data dissemination and search. Consequently, 
the security model is also shifted from host to content 
authentication. 

2. Content-based routing: Content routing in CON re- 
lies on content rather than hosts, aiming to handle in- 
creased amounts of network traffic and be more re- 
silient to network bursts and users' mobility. 

3. Content Delivery: Content is efficiently delivered 
using multi-path routing and leveraging in-network 
caching, in order to minimize network bandwidth and 
delivery delay, and transparently handle mobile users. 

4. In-network storage: All CON network components 
provide caching capability. Note that this is different 
from packet buffers in today's routers, as cache size is 
expected to be several orders of magnitude bigger in 
CON. 

Actors. CON involves several entities. End users ex- 
press interests and fetch data using a wide range of de- 
vices. Interests represent the willingness of the user to re- 
trieve certain data, independently of its location. Content 
routers are responsible for forwarding interests and sending
back the associated data. Each router is assumed to
have a built-in cache. Cache size as well as caching al- 
gorithm may differ from one router to the other. Finally, 
content producers generate the content, which can be ei- 
ther static (time-independent) or dynamic (generated upon 
request). Although data centers and/or geographically
distributed servers may be used to serve content, we simplify
the model by considering a single source machine. 

In the following, we overview CON's architecture design. 
For further details, we refer the reader to [23, 2]. 

Caching. A key feature used to increase overall net- 
work efficiency in CON is caching. All nodes in the net- 
work are expected to participate in the caching effort, from 
core routers to mobile devices. Caches provide in-network
storage and are assumed to be several orders of magnitude
larger than today's buffers. This capacity allows content to
be stored for longer periods and enhances network performance.
There are a number of efforts aiming at optimal
caching strategies in CON - see, e.g., [5, 64, 68, 67]. In
particular, [67] demonstrates that network topology has a
limited effect on caching efficiency, whereas the catalogue
size (i.e., the size of all content) and content popularity play
a major role. [64] proposes a probabilistic
caching algorithm, based on an approximation of the cache
capacity, while [22] introduces a caching approach that relies
on content popularity to decide whether or not to cache the
content.

[Figure 1 diagram; recoverable labels: router, neighbor router,
"Get content x", "Is x in the cache?", "Verify signature & cache",
"content x".]

Figure 1: An overview of the main CON features: content routing,
caching and content signature

Content Naming. The main abstraction in CON archi- 
tecture is the Content Object (CO). All kinds of content, 
ranging from web pages to documents, and even interactive 
content, such as VoIP, are abstracted as a CO. An object 
is always identified by a name, which must be unique as 
it serves as identifier for searching and disseminating as- 
sociated content. Moreover, since content can be fetched 
from anywhere, there should be a secure binding between 
content name and content data (i.e., name-data integrity) 
as well as object authenticity. Finally, objects retrieved 
from cache should carry information about the object owner 
(publisher). Two naming approaches have been proposed so 
far - flat and hierarchical. 

Flat naming. Flat names are self-certifying, i.e., a CO's
name-data integrity is bound to its name; thus, it can be
verified without any PKI. In its simplest form, the name is
the cryptographic hash of the content.
Although one can assess the validity of the content,
it is impossible to verify its provenance and relevance.
However, a few techniques have been proposed
to enable provenance verification, by allowing the publisher
to have more control over the naming by adding a label.
For instance, the naming mechanism can be instantiated
as $Digest(PublicKey_P \| Label_C)$ [48], where $\|$ denotes
concatenation and $PublicKey_P$ is the content producer's
public key, or as $Digest(PublicKey_P) \| Label_C$, as
in DONA [47]. The content is authenticated by
checking whether or not it is verifiable using $PublicKey_P$.
Unfortunately, neither solution guarantees a binding between
the name and the content². Moreover, as pointed
out by [70], flat naming suffers from several other
disadvantages. First, opaque names are location-free, thus
making it difficult to build a routing mechanism to retrieve the
nearest copy [16, 54]. Therefore, location-dependent
approaches are often used, such as Distributed Hash Tables
(DHTs) [48, 65, 78]. Second, and most importantly, it is
well known that users can comfortably remember only simple
strings like email addresses and hostnames [1, 39, 3]; thus,
indirection architectures [1, 7] become necessary. This is
somewhat similar to the Domain Name System (DNS) service
(used today to map user-friendly names to network names),
and, unfortunately, shifts security/privacy problems, once
again, from the content name to the mapping architecture.
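
The self-certifying property of flat names can be illustrated with a minimal sketch. It assumes SHA-256 as the digest and raw byte strings for keys and labels; the function names (`flat_name`, `verify_name`, `labeled_name`) are hypothetical and do not come from any CON implementation.

```python
import hashlib


def flat_name(content: bytes) -> str:
    """Simplest flat name: the cryptographic hash of the content itself."""
    return hashlib.sha256(content).hexdigest()


def verify_name(name: str, content: bytes) -> bool:
    """Check name-data integrity without any PKI: recompute and compare."""
    return hashlib.sha256(content).hexdigest() == name


def labeled_name(public_key: bytes, label: bytes) -> str:
    """Digest(PublicKey_P || Label_C); plain concatenation is an assumption."""
    return hashlib.sha256(public_key + label).hexdigest()


content = b"hello CON"
name = flat_name(content)
print(verify_name(name, content))          # name matches the content
print(verify_name(name, b"tampered CON"))  # name does not match
```

Note that `labeled_name` never looks at the content, which mirrors the observation above that label-based names alone do not bind the name to the data.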

Hierarchical naming. With hierarchical naming, name 
structure resembles that of today's URLs, where the '/' sym- 
bol delimits the name components. In some cases (e.g., 
CCN [45] and TRIAD [43]), the name is human-readable, 
which makes it easier for the user to access the content. 
The major benefit of adopting a hierarchical structure is 
to enable aggregation, thus, improving routing scalability. 
Further, [70] proposes an improvement to content naming
centered on authenticating the link between name and content,
by making the content available as a triple
$M_{(N,P,C)} = (N, C, Sign_P(N, C))$, where $N$ is the content
name, $C$ is the content, and $Sign_P$ is the producer's
signature. This allows the content to be authenticated,
regardless of how or from whom
it was obtained. Nonetheless, hierarchical naming has at
least three drawbacks. First, names are not persistent: any
change in the hierarchy alters the content name and hence
makes the content unreachable. Second, since names are
semantically meaningful, they might leak sensitive information
about the content and hence may be considered a privacy
threat (which we further explore in Section 3.3). Finally, a
Public-Key Infrastructure (PKI) is needed to verify name- 
data integrity, potentially impacting the scalability and the 
security of the architecture. 

² As these schemes are flawed, we assume that all CON instances use the
secure binding method proposed by [70], where the content is made
available as a triple $M_{(N,P,C)} = (N, C, Sign_P(N, C))$, where $N$ is
the content name, $C$ is the content, and $Sign_P$ is the producer's
signature.

Figure 2: Topology of cache attacks.

Content Routing and Forwarding. Routing in CON is
carried out in two phases: (1) routing of CO requests (called
interests), and (2) routing the content back to the user.
Naturally, routing in CON depends on the naming scheme and,
in particular, on whether or not name aggregation is possible.
For CON based on flat naming, a Name Resolution Service
(NRS) is used to retrieve topological information (such
as the location of the data) based on the object name. Structured
routing algorithms are often used to exploit structured
network topologies, such as trees or DHTs. For instance,
DONA [47] maintains a tree topology and lets each router
store routing information (i.e., published content) of all its
descendants. Thus, any content (re)publication, deletion, or
modification is propagated up to the root. With hierarchical
naming, efficient routing and discovery are possible without
any external service. Request/data aggregation may also
facilitate network scalability. This approach may resemble
unstructured routing in IP, where IP addresses are replaced
with content names and route advertising is achieved through
flooding.
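
To make name aggregation with hierarchical names concrete, the following minimal sketch forwards an interest by longest-prefix match on name components. Longest-prefix matching is an assumption borrowed from CCN-style forwarding, and the table layout and face labels (`face0`, `face1`) are purely illustrative.

```python
from typing import Dict, Optional, Tuple


def components(name: str) -> Tuple[str, ...]:
    """Split a '/'-delimited hierarchical name into its components."""
    return tuple(c for c in name.split("/") if c)


def longest_prefix_match(fib: Dict[str, str], name: str) -> Optional[str]:
    """Return the next hop of the longest registered prefix of `name`."""
    comps = components(name)
    # Try the full name first, then progressively shorter prefixes.
    for length in range(len(comps), -1, -1):
        prefix = "/" + "/".join(comps[:length])
        if prefix in fib:
            return fib[prefix]
    return None


# Hypothetical forwarding table: one entry covers every name under /org.
fib = {"/org": "face0", "/org/site/videos": "face1"}
print(longest_prefix_match(fib, "/org/site/videos/clip.mp4"))
print(longest_prefix_match(fib, "/org/site/index.html"))
```

A single entry such as `/org` covers all names below it, which is the aggregation benefit mentioned above.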

3 Privacy Challenges in CON 

In this section, we present a systematic analysis of privacy
in Content-Oriented Networking (CON) by identifying
threats and, where feasible, discussing potential solutions.
Multiple proposals [47, 4, 45, 71, 43, 17] have been pre- 
sented in the last couple of years to instantiate CON, with 
relatively minor differences in their proposed design. To 
ease presentation, we discuss technical details while consid- 
ering one specific instance, namely CCNx [60], the open- 
source project that implements Content-Centric Network- 
ing (CCN) [45]. CCNx is considered one of the most mature 
examples of CON in the research community. Nonetheless, 
we stress that threats and proposed countermeasures dis- 
cussed in this paper apply to CON in general. That is, issues 
are about fundamental features in CON (such as caching,
naming, data delivery and provenance assurance) and not 
about the specifics of one implementation. 

3.1 Cache privacy 

We start our analysis with cache privacy. Recall that 
caching is a fundamental component of CON as it benefits 
both latency and bandwidth consumption. However, it also
introduces a fundamental challenge to user privacy. An adversary
may use a router's cache to infer content exchanged
(consumed) by users in the downstream and possibly link 
it to a specific user depending on its relative location to the 
user in the topology. Cache attacks in CON are exemplified 
in Figure 2. There are different types of attacks that can 
occur on cache privacy - we review them below. 

Timing attacks. By using time measurements [36], an
adversary Adv can determine if a content has been cached at
a particular router by measuring the delay to retrieve it. To
do so, Adv measures the delay $RTT_s$ to retrieve any content
from the source, the delay $RTT_c$ to get cached content from
the closest router, and the delay $RTT_t$ to fetch the targeted
content. Then, Adv compares the RTTs as follows:

• If $|RTT_t - RTT_c| < \epsilon$ (for negligible $\epsilon$): Adv
concludes that the target content has been cached at the
closest router (i.e., has been fetched by a neighboring
consumer connected to the same router).

• If $RTT_t > RTT_c$ and $RTT_t < RTT_s$: Adv knows
that the target content has been fetched from the source
recently and cached in the network, but not by one of its
immediate neighbors. Based on the difference between
$RTT_t$ and $RTT_c$, Adv can still estimate how close the
consumer of that content was to Adv's own location in the
network topology.

• Otherwise ($|RTT_t - RTT_s| < \epsilon$), Adv concludes that
the target content has not been consumed recently.

Such an attack allows Adv to check whether or not con- 
tent has been recently fetched, but not when. As mentioned 
above, since in-network caches are shared by all CON de- 
signs, this attack is inherent to all CON proposals. 
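
The three-way comparison used in the timing attack can be written down as a small decision function. This is only a sketch: the function name, the string labels, and the extra "inconclusive" outcome (for measurements that fit none of the three cases) are assumptions added for illustration, and real measurements would need averaging over several probes.

```python
def classify_target(rtt_t: float, rtt_c: float, rtt_s: float, eps: float) -> str:
    """Classify a target fetch delay against two reference delays.

    rtt_t: delay to fetch the target content
    rtt_c: delay to fetch content known to be cached at the closest router
    rtt_s: delay to fetch content from the source
    eps:   tolerance below which two delays are considered equal
    """
    if abs(rtt_t - rtt_c) < eps:
        return "cached-at-closest-router"
    if abs(rtt_t - rtt_s) < eps:
        return "not-recently-consumed"
    if rtt_c < rtt_t < rtt_s:
        return "cached-elsewhere-in-network"
    return "inconclusive"


# Illustrative delays in milliseconds (made-up values).
print(classify_target(rtt_t=5.2, rtt_c=5.0, rtt_s=80.0, eps=1.0))
print(classify_target(rtt_t=40.0, rtt_c=5.0, rtt_s=80.0, eps=1.0))
print(classify_target(rtt_t=79.6, rtt_c=5.0, rtt_s=80.0, eps=1.0))
```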

Protocol attacks. Without a careful design, content re- 
trieval protocols and their features in CON architectures can 
make access to cache content even easier. After investigat- 
ing such issues in CCNx [60], we found out that a num- 
ber of features and options in interest packets [59], and how 
they are matched with content packets, are particularly wor- 
risome: 

• Prefix-based matching: CCNx considers a content 
with name X to satisfy an interest for name F if F is a 
proper prefix of X. This can facilitate easy extraction 
of cache content without knowing exact names. Due 
to multiple types of content potentially satisfying an 
interest, an exclusion option is also conveniently pro- 
vided in CCNx interest packet format to allow exclu- 
sion of previously acquired content from subsequent 
queries. 

• Scoping: In CCNx, the scope of an interest packet
determines the maximum number of hops it will travel.
Such a feature makes it easy to query the caches of
particular routers, as it controls where (i.e., how many
hops away) an interest packet can travel. For instance,
setting the scope to 2 would restrict an interest
to propagate only to neighboring router(s) and allow
convenient querying of their caches without relying on
any timing information.

As a result of these features, an adversary Adv can mon- 
itor the access to sensitive content within a certain scope 
or easily dump nearby caches' content. The former attack 
is achieved by periodically issuing an interest $I_m$ for target
content $m$ and setting the scope accordingly (e.g., setting
the scope to 2 to monitor the caches of immediate neighbors).
If $I_m$ times out, Adv concludes that $m$ has not been fetched
yet. Whenever a consumer within the scope accesses $m$, $I_m$
is satisfied, thus allowing Adv to be notified. Hence, Adv
is able to infer when the file was fetched for the first time.
Dumping attacks can be achieved by sending an interest for 
the root prefix / or short prefixes, repeatedly, and exclud- 
ing what has been already received on successive interests. 
Combined with scoping, this method can easily be used to 
dump cache contents from nearby caches. 
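
The dumping attack can be sketched against a toy cache model. The class `ToyCache` and the function `dump_cache` are hypothetical and only mimic the two features discussed above (prefix-based matching and exclusion); they do not use the CCNx API, and scoping is modeled implicitly by querying a single cache.

```python
from typing import Dict, List, Optional, Set, Tuple


def components(name: str) -> Tuple[str, ...]:
    """Split a '/'-delimited name into its components."""
    return tuple(c for c in name.split("/") if c)


class ToyCache:
    """Toy router cache answering prefix interests with an exclusion set."""

    def __init__(self, entries: Dict[str, bytes]) -> None:
        self._entries = dict(entries)

    def match(self, prefix: str, exclude: Set[str]) -> Optional[str]:
        """Return one cached name with `prefix` as a proper prefix, or None."""
        p = components(prefix)
        for name in sorted(self._entries):
            c = components(name)
            if len(p) < len(c) and c[:len(p)] == p and name not in exclude:
                return name
        return None  # models the interest timing out


def dump_cache(cache: ToyCache, prefix: str = "/") -> List[str]:
    """Repeat the same prefix interest, excluding everything seen so far."""
    seen: Set[str] = set()
    while True:
        name = cache.match(prefix, seen)
        if name is None:
            return sorted(seen)
        seen.add(name)


cache = ToyCache({"/alice/mail/1": b"a", "/bob/photos/cat": b"b", "/bob/notes": b"c"})
print(dump_cache(cache))          # every cached name
print(dump_cache(cache, "/bob"))  # only names under /bob
```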

We believe that the above attacks are quite worrying;
however, observe that they work if the adversary
knows its own location relative to the
victim in the network topology. We distinguish two classes
of adversaries based on their location relative to a victim:

• Immediate Neighbor: if an attacker shares the first-hop
CON router with his potential victim, the privacy
risk is maximized: not only would it be easy to
monitor or dump that single close-by router, but the victim's
anonymity set would also be very small, due to the
limited number of users sharing that router.

• Remote Neighbor: Considering the tree-like topology
of content distribution from the original source to its
consumers (where the source is the root, consumers are
the leaves, and the intermediate routers are nodes in
between), the paths from an adversary and from a consumer to
the root will intersect in at least one node. Therefore, the
privacy risk decreases as the number of leaves in the
subtree rooted at that node increases (i.e., the anonymity
set gets larger).

Potential solutions 

Different algorithms have been proposed to enhance the
hit ratio [5, 64, 68, 67] of caches; however, none of them
takes into consideration the related privacy issues. In
the following, we discuss some potential countermeasures 
to mitigate such privacy threats at a high level. Although 
the detailed design and security analysis for these methods 
are not within the scope of this paper, we expect the research 
community to further investigate them in future work. 




Figure 3: Collaborative caching 

Wait before reply. A simple solution to the cache privacy
problem is to delay all requests: when the router fetches
content $m$, it should store the corresponding RTT, $t_m$. Then,
whenever a user requests $m$, the router waits $t_m$ before
sending the data back. This technique has three main
advantages: first, it achieves perfect privacy since Adv cannot
distinguish between cached and non-cached data. Second, it
does not make any assumptions about content correlation,
network topology, or consumers. Finally, it still achieves
reduced bandwidth thanks to caching. On the other hand,
however, this approach has the main drawback of eliminating
the positive effect of caching on content retrieval delay.
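
The "wait before reply" idea can be sketched with a router model that reports delays instead of actually sleeping. The class name `DelayingRouter` and the `fetch_upstream` callback (returning the data and the measured upstream RTT) are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class DelayingRouter:
    """Replays the original fetch delay t_m on every cache hit."""

    def __init__(self, fetch_upstream: Callable[[str], Tuple[bytes, float]]) -> None:
        self._fetch = fetch_upstream
        self._cache: Dict[str, Tuple[bytes, float]] = {}

    def request(self, name: str) -> Tuple[bytes, float]:
        """Return (data, delay observed by the requester)."""
        if name not in self._cache:
            # Cache miss: fetch upstream and remember the measured RTT t_m.
            data, t_m = self._fetch(name)
            self._cache[name] = (data, t_m)
            return data, t_m
        # Cache hit: no upstream traffic, but wait t_m before replying.
        return self._cache[name]


upstream_calls = []


def fetch(name: str) -> Tuple[bytes, float]:
    upstream_calls.append(name)
    return b"payload", 80.0  # made-up RTT in milliseconds


router = DelayingRouter(fetch)
print(router.request("/site/page"))  # miss
print(router.request("/site/page"))  # hit, same observable delay
print(len(upstream_calls))           # upstream was contacted once
```

The requester observes the same delay in both cases, while the upstream link carries the content only once, which matches the trade-off described above.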

Delay the first k. An alternative to the above solution
is to delay the first $k$ requests for content $m$, to ensure that
only popular content is cached on edge routers serving a small
number of consumers. Note that $k$ should be chosen
randomly by routers; otherwise, an adversary could break this
scheme by issuing $k$ requests and timing the responses. The
main advantage of this approach is that consumers accessing
popular content are unlikely to experience any delays
introduced by routers. However, $k$ should be carefully and
randomly chosen for each content. High values of $k$ result
in delaying most of the requests, whereas a small value
will have a negative impact on users' privacy by reducing
the anonymity set. Furthermore, the delay on the retrieval
of less popular content will still be high.
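
A minimal sketch of the "delay the first k" policy follows. It assumes that the threshold is drawn uniformly from a range `[k_min, k_max]` per content and that only cache hits are counted; both choices, like the class name `DelayFirstK`, are illustrative and not prescribed by the text.

```python
import random
from typing import Dict


class DelayFirstK:
    """Delays the first k cache hits of each content, with k chosen at random."""

    def __init__(self, k_min: int, k_max: int, rng: random.Random) -> None:
        self._k_min = k_min
        self._k_max = k_max
        self._rng = rng
        self._threshold: Dict[str, int] = {}
        self._hits: Dict[str, int] = {}
        self._rtt: Dict[str, float] = {}

    def on_fetched(self, name: str, rtt: float) -> None:
        """Record the upstream RTT and draw a fresh random k for this content."""
        self._rtt[name] = rtt
        self._threshold[name] = self._rng.randint(self._k_min, self._k_max)
        self._hits[name] = 0

    def reply_delay(self, name: str) -> float:
        """Artificial delay to apply to the next cache hit for `name`."""
        self._hits[name] += 1
        if self._hits[name] <= self._threshold[name]:
            return self._rtt[name]  # still within the first k: hide the hit
        return 0.0                  # popular enough: serve from cache at once


policy = DelayFirstK(k_min=2, k_max=5, rng=random.Random())
policy.on_fetched("/site/video", rtt=80.0)
print([policy.reply_delay("/site/video") for _ in range(7)])
```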

Collaborative caching. Multiple nearby caches could
collaborate to create a distributed cache that serves a bigger
set of users. Such an integration would create the illusion
of a single, bigger cache and would also increase
the anonymity set for consumers. Several algorithms
have been proposed in this context [49, 73] and can be
categorized as hierarchical or mesh-based. The former is used
for caches that have a tree-like structure, while the latter
refers to a flat structure. In a simple mesh scenario, two
routers $R_1$ and $R_2$ collaborate as follows: the hash universe
(e.g., 256 bits) is divided into two subspaces $s_1$ and $s_2$,
where $R_1$ and $R_2$ store the elements in $s_1$ and $s_2$,
respectively. Based on the computed hash (of the routable
prefix of the content name), a router decides whether to cache
that content or transmit it to its neighboring router.
Similarly, interests are first forwarded to the router
responsible for caching
the corresponding subspace (see Figure 3). Collaborative
caching has two main advantages: first, users are likely to
fall into larger anonymity sets even if the requested content
was found in a cache. Second, hit rates for caches will
increase, as the collaboration would remove redundancy between
nearby caches and effectively simulate a cache that is
much bigger in capacity. Due to this second property, there
may be some economic incentive to deploy this solution, in
addition to protecting user privacy [72].
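
The mesh scenario can be sketched as a hash-space partition. The sketch assumes SHA-256 as the 256-bit hash, equal-sized contiguous subspaces, and a routable prefix made of the first two name components; these choices and the function names are assumptions, not part of the cited schemes.

```python
import hashlib
from typing import List


def routable_prefix(name: str, depth: int = 2) -> str:
    """Hypothetical routable prefix: the first `depth` name components."""
    parts = [c for c in name.split("/") if c]
    return "/" + "/".join(parts[:depth])


def responsible_router(name: str, routers: List[str]) -> str:
    """Map a content name to the router owning its hash subspace."""
    digest = hashlib.sha256(routable_prefix(name).encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")  # a point in the 256-bit universe
    subspace_size = (1 << 256) // len(routers)
    # Clamp so that rounding never yields an out-of-range index.
    index = min(value // subspace_size, len(routers) - 1)
    return routers[index]


routers = ["R1", "R2"]
for n in ["/org/site/a.html", "/org/site/b.html", "/net/cdn/video"]:
    print(n, "->", responsible_router(n, routers))
```

Names sharing a routable prefix always land on the same router, so an interest only needs to be forwarded to that router to find a cached copy.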

Probabilistic caching. Introducing randomness in the 
caching procedure may impact the accuracy of attacks. 
One possible approach could be probabilistic caching [64], 
where a router decides to cache content based on its position
on the forwarding path as well as the available space in
the cache. Since this decision is based on internal states of 
routers, it would not be known to an adversary. However, 
not caching a random subset of content can provide only a 
very limited privacy protection as the cached subset would 
still lead to violating user privacy. 

3.2 Content privacy 

Unencrypted communication over IP networks can be 
spied upon using Deep Packet Inspection (DPI) [9] by an 
adversary on the end-to-end communication path. However,
in CON, the threat to content privacy becomes even more
serious due to the presence of persistent memory (caches)
within the network. 

Monitoring and Censorship: DPI tools are already commonly
used by certain governments or Internet Service
Providers (ISPs) for classifying and censoring content (e.g.,
based on keywords). However, DPI on IP networks requires
powerful adversaries that are strategically located on
the main communication path and have enough computation
power to perform DPI at line speed. As CON stores
data packets for a long time and makes them available to
anyone who asks for them, neither of these assumptions about
the adversary holds. In fact, the adversary might retrieve content
from caches for DPI-based monitoring, classification and
censorship.

Potential solutions 

As the problem is caused by the lack of data confidentiality,
encryption would be the de facto solution. Naturally,
the encryption mechanism should provide the best balance
between security and efficiency. It should also preserve
the benefits of the caching mechanism.

Symmetric/Asymmetric encryption. A trivial approach
might be to use mechanisms similar to SSL/TLS, where a
client generates a session key and encrypts it using the
producer's public key. After receiving this key, the producer
uses it to encrypt the content and sends it back to the user.
The main consequence of such an approach is that it disables
the caching mechanism, as only one user can decrypt the content.

Broadcast encryption [37, 11] allows a "broadcaster" to
send an encrypted message to a set of $n$ receivers, each
of which has a different private key. Given any subset of
the $n$ receivers, the broadcaster can construct an encrypted
message so that only the receivers in the subset can decrypt
it. Using broadcast encryption to guarantee confidentiality
in CON presents several advantages. First, the publisher of a
content can encrypt it only once, for a known subset of users.
Also, the publisher can precompute and store, or generate
on the fly, new decryption keys for already published content.
Further, since encrypted content can be consumed by
many users, the benefit from caching can be preserved in the
network. However, the publisher should generate and store
as many keys as the number of clients ($n$). Also, the producer's
public key and ciphertexts would be of size $O(\sqrt{n})$ [11],
which may result in a significant communication overhead.

Proxy re-encryption [10] allows a third party (called
proxy) to re-encrypt a ciphertext which has been encrypted
for Alice, so that it can be decrypted by another user,
e.g., Bob. The proxy is considered "semi-trusted" because
it does not see the content of the messages being
translated. In the CON scenario (see Figure 4), the content
provider could generate a public/private key pair
($PK_P$/$SK_P$) for each content object. The content, $m$, is
then encrypted as $m_s = ENC(m, SK_P)$. Whenever a
client $C$ (with public/private key pair $PK_C$/$SK_C$) retrieves
the content $m_s$, it queries the content publisher to generate
a re-encryption key by sending $PK_C$. $C$ then receives a
transformation key $PK_{PC}$ from the publisher that allows
him to re-encrypt content $m_s$ so that he can decrypt it (i.e.,
$DEC(RE\text{-}ENC(m_s, PK_{PC}), SK_C)$ is the original
content $m$). This allows the message to be encrypted only
once and lets the producer retain control over the decryption,
since he can refuse the delivery of the transformation
key. Also, key management is simplified, as the producer
creates transformation keys on the fly and does not have
to store any additional keys besides his $PK_P$/$SK_P$. The
encrypted content is disseminated as $m_s$ to all users and
allows them to benefit from nearby caches. However, it
requires both asymmetric encryption and re-encryption (key
transformation), which are computationally more expensive
than commonly used symmetric-key encryption algorithms.
Nonetheless, as the data is encrypted only once, this overhead
can be acceptable in many cases.

Cover files. Arianfar et al. [6] described an algorithm 
to mix legitimate content with so-called "cover files". The 
content publisher selects a cover content to mix with legiti- 
mate content. Cover files are known to both user and the 
adversary Adv. All files are cut in equally-sized blocks 
and padding is used when necessary. For all k tuples, 
composed of cover and legitimate blocks, the publisher 
computes the exclusive-or and publishes the result. For 
instance, if k=2 and given the blocks ci, C2, and I2 
(c for cover and / for legitimate), the publisher computes 
Ci©C2, CiQ^i, ci®Z2, C2®?i, C2©Z2 and and publishes 
them. Using a secure side channel, the user retrieves meta 
information, such as the content hash and length in blocks, 
cover blocks and the algorithm for generating the names of 
each block. To retrieve the content, the user requests chunks 
per name and uses belief propagation or Gaussian elimination
to reconstruct the original file. However, this technique
requires the cooperation of producers, who have to generate
large amounts of data and XOR them. In fact, as pointed out
in [6], the publisher must produce all the chunks in advance
and thus must perform O((α + β)^k) operations, where α
represents the number of legitimate blocks, β the number of
cover blocks, and k the tuple size. Hence, this method incurs
several drawbacks that make it impractical for real-world
use.
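
The mixing step can be sketched as follows for the k=2 case. This is a simplified illustration rather than the algorithm of [6]: the block size, the labels c1, l1, ..., and the function names are assumptions, all k-combinations of blocks are published, and the user recovers each legitimate block by XORing a published pair with a cover block it already knows (so no belief propagation or Gaussian elimination is needed in this simple case).

```python
from itertools import combinations

BLOCK = 8  # bytes per block (illustrative value)

def split_blocks(data, size=BLOCK):
    """Cut data into equally sized blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def publish(cover, legit, k=2):
    """Map each k-tuple of block labels to the XOR of those blocks."""
    blocks = {f"c{i + 1}": b for i, b in enumerate(split_blocks(cover))}
    blocks.update({f"l{i + 1}": b for i, b in enumerate(split_blocks(legit))})
    published = {}
    for labels in combinations(sorted(blocks), k):
        mixed = bytes(BLOCK)
        for label in labels:
            mixed = xor(mixed, blocks[label])
        published[labels] = mixed
    return published

def retrieve(published, cover, n_legit):
    """Recover the legitimate blocks from the (c1, l_i) chunks; assumes k=2."""
    c1 = split_blocks(cover)[0]
    return b"".join(
        xor(published[("c1", f"l{i + 1}")], c1) for i in range(n_legit)
    )

cover = b"COVERDATA0123456"       # two cover blocks
legit = b"secret message!!"       # two legitimate blocks
published = publish(cover, legit)
restored = retrieve(published, cover, n_legit=2)
```

With two cover and two legitimate blocks, publish produces one chunk per pair of blocks, which already hints at how quickly the publisher's workload grows with the number of blocks and with k.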

3.3 Name privacy 

Name privacy arises from the semantic correlation be- 
tween human-readable content name and the content it- 
self. Unlike IP, where addresses represent hosts in the net- 
work and are not semantically correlated with the content, 
CCNx [60] names the content itself and routes data based 
on content names. Unfortunately, such a property creates an 
imminent privacy threat as the content names are not only 
visible but also expected to be semantically related to the 
content itself (e.g., /US/WebMD/AIDS/Symptoms/html).
Although this initially seems to be similar to an HTTP con- 
nection over IP, it is actually more fundamental in CCNx, 
as content names cannot be encrypted like the URLs in 
HTTPS connections. Unlike CCNx, some other architec- 
tural proposals such as [47, 4] use flat names that are not 
human-readable. However, all of these proposals rely on a 
Name Resolution Service (NRS) that performs the transla- 
tion from human-readable names. Since NRS information 
is public and accessible to an adversary, even architectures
with non-human-readable names create the same inherent
threat, due to the naming of content pieces.

Potential solutions 

Bloom filters. The main name privacy challenge in CONs 
is keeping the name private while ensuring accessibility and 
routability. A possible solution is to use Bloom filters [14] 
to identify content. The resulting architecture would be 
composed of three main blocks: 

1. A hierarchical Bloom filter used as the routing table.

2. A counting Bloom filter for each interface, used as the
PIT [77].

3. A hierarchical Bloom filter used as the router storage.
Figure 5: Routing table using a hierarchical Bloom filter



Figure 5 shows an example architecture for a routing
table. Rather than sending the request in the clear
for a hierarchically named content, a client would compute
the corresponding hierarchical Bloom filter as HB =
{B_1, B_2, ..., B_n}, where B_i is the Bloom filter of the
name components up to the i-th component. For instance,
when asking for /NYtimes/article/green-economy, the client
computes a Bloom filter B_1 of /NYtimes/, B_2 of
/NYtimes/article, and B_3 of /NYtimes/article/green-economy.
This scheme would work on CCNx as follows: a router checks
whether the last filter (B_n, since it contains the exact
content name) is in its content store; if so, the content is
returned to the customer. If not, the router verifies whether
B_n exists in any PIT. If a match is found, the counting
Bloom filter of the corresponding PIT is updated (incremented
by one) and the interest is dropped, since a request has
already been forwarded. Finally, if the interest is found
neither in the content store nor in any PIT, the routing
process starts: the router checks for the longest prefix
match in its routing table, starting from B_n and going all
the way to B_1 until it finds a matching routing entry. This
approach would benefit from the obfuscation of the content
name that results from transforming it into a
random-looking string of bits. Also, it can reduce the
space needed for storing PITs [77], the content store, and
the routing table, depending on the parameters of the filters
and the size of the content name domain. However, Bloom
filters introduce false positives and periodically require
resetting.
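
The lookup order described above (content store, then PIT, then longest prefix match) can be sketched as follows. This is a minimal illustration under stated assumptions: the filter size, the number of hash functions, and the Router class are hypothetical, and plain dictionaries keyed by the filter value stand in for the counting and hierarchical Bloom filters a real router would use.

```python
import hashlib

M_BITS, K_HASHES = 256, 3  # illustrative filter parameters

def bloom(name):
    """Bloom filter (as an integer bitmask) containing the string `name`."""
    bits = 0
    for i in range(K_HASHES):
        digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest, "big") % M_BITS)
    return bits

def hierarchical_bloom(name):
    """Return [B_1, ..., B_n], where B_i covers the first i name components."""
    parts = name.strip("/").split("/")
    return [bloom("/" + "/".join(parts[:i])) for i in range(1, len(parts) + 1)]

class Router:
    def __init__(self):
        self.content_store = {}  # B_n -> cached content
        self.pit = {}            # B_n -> number of pending interests
        self.routes = {}         # B_i -> outgoing interface

    def on_interest(self, hb):
        exact = hb[-1]                      # B_n identifies the exact name
        if exact in self.content_store:
            return ("data", self.content_store[exact])
        if exact in self.pit:               # already forwarded: aggregate
            self.pit[exact] += 1
            return ("aggregated", None)
        for b in reversed(hb):              # longest prefix first: B_n .. B_1
            if b in self.routes:
                self.pit[exact] = 1
                return ("forward", self.routes[b])
        return ("no-route", None)

router = Router()
router.routes[bloom("/NYtimes")] = "eth1"
hb = hierarchical_bloom("/NYtimes/article/green-economy")
first = router.on_interest(hb)    # no cache entry, no PIT entry: forwarded
second = router.on_interest(hb)   # same interest again: aggregated in the PIT
```

The router only ever handles the bitmasks in hb, never the name string, which is the obfuscation property the scheme relies on.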

3.4 Signature privacy 

One of the main goals of CONs is to decouple content from
its location and allow retrieving it from nearby caches. In
order to trust fetched data, some CON architectures, such as
CCNx [60], use digital signatures to provide guarantees on
provenance and integrity. Although signatures are powerful
tools that bind content to its producer, ordinary digital
signatures may leak sensitive identity information about the
signer.^ This is problematic, especially when considering
censorship and monitoring, as content from certain publishers
can be easily and conveniently identified from the signing
key, which is explicitly stated in every data packet in
CCNx.

Potential solutions 

Confirmer signatures. A first approach to prevent an 
adversary from verifying a signature would be to use con- 
firmer signatures. Confirmer signatures are undeniable sig- 
natures [19] where the signer delegates the verification to a 
third party (the confirmer), thus, signatures cannot be veri- 
fied without interacting with this party. Using this method, 
multiple producers may delegate the verification to a third 
party and increase the anonymity set for publishers.
Although this method would be easy to implement [19, 40] and
would not require any modification to the CCNx schema, it
requires a third party as a confirmer and introduces
another round of communication for signature verification.

Group signatures [20] allow the signer to hide in a set
of potential signers, thus providing signer ambiguity. As
such, the client can verify that the signature was generated
by a member of the group but is unable to tell which one. For
instance, a company X that has multiple employees may
use a group signature so that a signature cannot be publicly
traced back to a single user but only to the company.
Group signatures are efficient, since the size of the
signature does not depend on the size of the group. However,
they assume the presence of a trusted group manager who
admits group members, distributes keys, and can revoke
anonymity within the group, which makes them appropriate only
in limited settings with collaborating users.

Ring signatures [66]. As cooperation between users is not
always achievable, ring signatures [66] simplify group
signatures by removing both the group manager and the
interaction among members. Therefore, there is no need to
prearrange a group of users, nor for special procedures for
group management and key distribution. Also, the anonymity of
the actual signer is always protected. For instance, a
company X could collect the public keys of n other trusted
companies Y_1, Y_2, ..., Y_n, as these keys are publicly
available. Then, X can generate a signature σ for content m
that keeps the signer unidentifiable among X and the other
trusted companies. When a client fetches content m, he is
able to verify that the content was produced by one of
{X} ∪ {Y_1, Y_2, ..., Y_n} without knowing which one it is.
This scheme allows customers to trust content as long as all
possible signers are trustworthy, while making anyone
observing their traffic unable to single out the signer.
However, the communication overhead introduced by ring
signatures is linear in the size of the ring. Also, it would
still be possible to enforce censorship based on signatures
by blocking all content that lists certain entities among its
potential signers.

^ CCNx currently offers two choices, RSA and ECDSA, as
signature algorithms.

Ephemeral identities. Any content producer can create
ephemeral keys to sign content. This would effectively
prevent identifying the publisher of content by looking at
its signature. However, this would also prevent customers
from verifying the source/publisher of a content without an
additional mechanism to authenticate it. Luckily, CCNx allows
creating unforgeable links by including the hash of a content
object in the link and allows these links to be signed. This
would allow publishing content signed with ephemeral keys
that are not traceable to a long-lived identity, but would
still allow users to establish transitive trust when they
fetch it, by following a link that is published by a trusted
party. For instance, a link to sensitive content might be
published and signed by a trusted blog (m_s), but the actual
content (m_a) might be published and signed with a one-time
identity and be served through an anonymous hosting service
(e.g., rapidshare.com). Any user trusting the blog author can
trust m_s, and thus m_a, but an eavesdropper observing m_a is
unable to link it to its publisher. This approach is very
easy to deploy and does not need any modification of the
current architecture; however, it cannot hide access to the
link or prevent leakage through the link's signature.
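
The transitive-trust chain above (trusted key, signed link, content hash, content) can be sketched as follows. Several choices here are assumptions made only to keep the example self-contained: Lamport one-time signatures built from SHA-256 stand in for the RSA or ECDSA signatures CCNx actually offers, the link encoding (name, separator, hash) is hypothetical, and so are all function and variable names.

```python
import hashlib
import secrets

def H(data):
    return hashlib.sha256(data).digest()

def lamport_keygen():
    """One-time key pair: 256 pairs of secrets and their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def _bits(msg):
    d = int.from_bytes(H(msg), "big")
    return [(d >> i) & 1 for i in range(256)]

def lamport_sign(sk, msg):
    """Reveal one secret per bit of the message hash."""
    return [sk[i][bit] for i, bit in enumerate(_bits(msg))]

def lamport_verify(pk, msg, sig):
    if len(sig) != 256:
        return False
    return all(H(s) == pk[i][bit]
               for i, (s, bit) in enumerate(zip(sig, _bits(msg))))

# Publisher: sign the content with a throwaway (ephemeral) key.
content = b"sensitive article"
eph_sk, eph_pk = lamport_keygen()
content_sig = lamport_sign(eph_sk, content)

# Trusted blog: publish a signed link that embeds the content hash.
blog_sk, blog_pk = lamport_keygen()
link = b"/anon-host/article-42|" + H(content)   # hypothetical link encoding
link_sig = lamport_sign(blog_sk, link)

def trust_content(link, link_sig, trusted_pk, fetched):
    """Trust the fetched content only via the trusted party's signed link."""
    if not lamport_verify(trusted_pk, link, link_sig):
        return False
    return link[-32:] == H(fetched)

ok = trust_content(link, link_sig, blog_pk, content)
```

The ephemeral key eph_pk appears only on the content object and carries no long-lived identity; the consumer's trust decision rests entirely on blog_pk and the hash in the link.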

4 The Potential of CON's Privacy 

As the amount and the sensitivity of personal informa- 
tion disseminated on the Web increase, so do related pri- 
vacy concerns. According to the Pew Internet & American 
Life Project, Internet privacy is a growing concern among 
Americans [62], especially in the mobile environment [63]. 

In this section, we explore how CON relates to these issues
and draw a comparison between CON and today's Internet. We
investigate whether the large-scale deployment of CON
architectures would be beneficial or detrimental,
privacy-wise, under a few different perspectives, and, in the
process, we identify a number of open challenges.

4.1 Anonymity 

Anonymity on the Internet describes the state of being 
not identifiable within a set of subjects - i.e., actions users 
carry out on the Web cannot be connected with their offline 
identities. Online anonymity is motivated by a number of 
factors, as users are often concerned about harassment, or 
even threats to their lives, resulting from online activities, 
such as protesting and whistle-blowing [34].

In today's Internet, there exist a few techniques for
anonymous communications. A straightforward technique 
is to rely on a trusted anonymizing proxy, which relays 
traffic while removing identifying information (e.g., IP ad- 
dresses and cookies) - see, for instance, the Anonymizer 
[13] and the Lucent Personalized Web Assistant [8]. While 
an external entity (i.e., a proxy server) is needed today to re- 
move and potentially substitute the source's identifier, CON 
architectures would actually offer such a service natively, 
by removing both source and destination addresses. In a 
sense, one could see the neighboring router as an anonymiz- 
ing proxy. In reality, however, a local passive adversary 
could monitor all proxy connectivity and identify users (see 
Section 3.1). 

Proxy-based solutions for anonymity, in both current and
CON architectures, generally rely on a single point of
failure and trust. Therefore, a few proxy-less
anonymity solutions have been proposed in the last several
years. They are usually classified based on the application
constraints: delay-tolerant (e.g., email and file-sharing)
or low-latency (e.g., web browsing). While both
approaches rely on mix networks, low-latency anonymizing
networks cannot afford to delay or reorder traffic, or to
introduce decoy traffic, because of their latency
requirements. Tor [32] is the best-known, and most widely
used, low-latency anonymizing tool. Using onion routing and
layered encryption, it builds a multi-hop circuit, composed
of at least three random nodes chosen from a central
directory, from the user to the Web destination.

A natural question, in this context, is whether or not 
onion routing techniques can be used in CON, and a first 
answer has been recently provided by ANDaNA [30], a Tor- 
like low-latency anonymizing tool for CCN [45]. Compared 
to Tor, it only requires two hops and is thus reportedly 2.3
to 7 times faster than Tor when downloading small to
medium-size files. Moreover, since data in CON is signed,
attacks where the adversary hijacks and modifies the server's
answers to de-anonymize users (see, e.g., [53]) would not be
feasible.

The portability of ANDaNA to architectures other than
CCN, however, depends on the routing protocol. For
instance, PSIRP [71] routers do not store any routing
information and rely on a Forwarding Identifier (FI)*
provided by the client to route back the content; thus,
guaranteeing anonymity is very challenging, since the routing
protocol itself leaks information. As the FI carries
information about the user, it should be anonymized while
ensuring that data is correctly forwarded. In DONA [47] and
NetInf [4], by contrast, the routing protocol is similar to
that used in CCN [45]; hence, it is safe to assume that
ANDaNA [30] can be used on top of them to provide anonymous
communications. However, due to the data encryption used in
ANDaNA, the CON caching mechanism cannot be fully used. Also,
ANDaNA inherits some of Tor's weaknesses, e.g., the
difficulty of circumventing censorship while retrieving the
directory with all participating nodes. In fact, while today
Tor bridges are used as an alternative way to reach the
network, with mixed results [74, 31], it is not clear how to
do this in ANDaNA/CON.

* FI is a Bloom filter-based technique used by routers to
select the forwarding interface.

Nonetheless, ANDaNA seems more resilient than Tor to
website fingerprinting [61], where an attacker leverages
patterns such as the timing, quantity, and direction of
traffic to classify it despite encryption and/or tunneling.
ANDaNA seems to be more resilient at least with respect to
timing, thanks to caching. Also, as data packets are bigger
in CON, both the number and the size of requests carry less
information than in IP. Furthermore, CON is not affected
by response hijacking attacks [53], where the adversary
leverages the lack of data authenticity to modify the
server's answers (for instance, [53] shows that, by hijacking
the answers of a BitTorrent tracker, an attacker controlling
a Tor exit node can successfully de-anonymize users). This is
not possible in CON, since content is signed and thus cannot
be modified undetected.

4.2 Censorship Resistance 

The increasingly important role played by the Web has
led many governments to pay growing attention to monitoring
and censoring Internet traffic. The term "Internet
censorship" denotes the control and suppression of the
access to, or the publishing of, information on the Internet.

A number of techniques are typically employed to filter
and/or block access to certain content, including DNS
tampering, IP blocking, and keyword filtering, as well as
monitoring the usage of specific protocols. DNS tampering
consists in "deregistering" the targeted domain, thus making
the domain unreachable. IP blocking relies on predefined
blacklists of banned IP addresses, and keyword filtering
leverages Deep Packet Inspection (DPI) to dig into traffic
and drop packets containing sensitive keywords. In today's
Internet, both DNS tampering and IP blocking can be easily
circumvented. The former can be bypassed by using a public
DNS server (e.g., OpenDNS and DNSCrypt [58]) as the main
resolver rather than the ISP's default DNS. The latter can
also be circumvented using a proxy. DPI-based filtering is
harder to counter and is used increasingly often - see,
e.g., the deployment of China's Great Firewall (GFC)
[27, 74].

Unfortunately, Internet censorship appears to be even
easier in CON architectures. First, and foremost, naming
content facilitates keyword filtering. Second, as CON routers
have greater computational and memory resources, content
blocking could be carried out more effectively, without the
need for expensive dedicated hardware. Finally, data
monitoring is feasible, since neither interests nor data are
encrypted. Therefore, an attacker only needs to modify the
routing protocol so that any "unwanted" interest is dropped.
Content can be censored independently of its provider and
selectively, following a fine-grained censorship approach.
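
To illustrate how little machinery name-based filtering requires, the following sketch shows a forwarding decision that inspects only the cleartext interest name. The blacklist contents and the function name are hypothetical, and a real router would apply such a check inside its forwarding pipeline rather than as a standalone function.

```python
BLOCKED_KEYWORDS = {"aids", "protest"}  # hypothetical blacklist

def should_drop(interest_name, blocked=BLOCKED_KEYWORDS):
    """Return True if any name component contains a blocked keyword."""
    components = interest_name.strip("/").lower().split("/")
    return any(keyword in component
               for component in components
               for keyword in blocked)

dropped = should_drop("/US/WebMD/AIDS/Symptoms/html")
forwarded = not should_drop("/NYtimes/article/green-economy")
```

No payload inspection or connection state is involved: the decision depends on the interest name alone, independently of which provider or cache would have served the content.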

Also, note that DPI techniques can be stateful (i.e., keep- 
ing track of the network connections) and, as such, they 
require a significant amount of memory and processing
resources. This requirement impacts filtering capabilities
for "busy" Internet traffic. For instance, the work in [26] 
shows that censored Internet traffic in China has diurnal pat- 
terns: filtering becomes less effective and lets more than one 
fourth of the offending packets through during the busy In- 
ternet traffic periods. By contrast, CON, which adopts the 
publish/subscribe paradigm, makes the decision to censor 
content based either on the interest request or on received 
data, without requiring any external information. 

However, if CON traffic is exchanged using 
ANDaNA [30], there would be a couple of features 
making it harder for the attacker to censor content. First, 
both interests and content are encrypted, and, second, as the 
exit node is usually out of the attacker's control, the latter 
cannot delete content. Nonetheless, the effectiveness of
ANDaNA (or other techniques in CON) in countering Internet
censorship remains an open question that calls for further
research.

4.3 Untraceability 

Internet users today are often tracked and profiled. Large 
amounts of data, corresponding to several different events, 
are being collected and mined, so that this knowledge can 
be used to provide personalized services and behavioral ad- 
vertising. At the same time, privacy advocates have labeled 
this practice as privacy-invasive, since it allows companies 
and providers to build detailed user profiles including
personal and sensitive attributes, such as sexual orientation
and medical conditions.^

Today, a number of techniques are employed to track
users, the most widespread of which are cookies. Naturally,
as cookies are used at the application level, it seems that
they can be utilized (to track users across multiple
websites) in CON too. However, cookie implementation in CON
is somewhat cumbersome. While in current architectures
browsers automatically send cookies to Web servers when
fetching data, following the so-called same-origin policy, it
is not clear how cookies will be implemented for static
content in CON, since data can be fetched from anywhere.
Cookies could be transmitted to the source only when fetching
dynamic data and, as such, cookie-based tracking mechanisms
in CON will be less aggressive, as only dynamic content can
be tracked. Similar arguments apply to JavaScript-based
tracking, Supercookies, and Evercookies. Observe
that protective measures against profiling are widely
available and can be used both with today's Internet and, in
the future, in CON. For instance, a few browser add-ons, such
as Ghostery [35] and DoNotTrack [55], could prevent the
browser from sending cookies.

^ See, for instance, a recent incident where targeted
advertisement led to the disclosure of a pregnancy otherwise
kept hidden [44].

However, more aggressive tracking techniques have recently
emerged. For instance, stateless tracking uses both the
user's IP address and browser fingerprint to uniquely track
users on the web [76]. While browser fingerprinting can be
mitigated by using plug-ins, e.g., NoScript, it is still hard
to hide a user's IP address. However, as CON architectures
remove, by design, both parties' identifiers (i.e., source
and destination addresses), IP-based tracking would become
impossible.

Arguably, many privacy-enhancing tools used in today's 
Internet can be easily ported to CON. Nevertheless, as dis- 
cussed in Section 3, specific CON network components 
can be used to better track users. Specifically, neighbor- 
ing routers have a significant role in protecting user privacy: 
if compromised or misused, they can severely endanger it. 
For instance, a router might collect content, content names, 
or content signatures to track users' navigation and build 
accurate profiles. 

While the lack of traceability might improve user privacy,
it naturally raises both security and economic questions.
Security concerns are related to the lack of source
addresses. During a security incident (such as a DoS attack),
this information is often crucial to identify the attacker.
Also, removing source addresses may thwart security solutions
used in firewalls and IDS/IPS tools: e.g., a simple method
used today to counter brute-force password-guessing attacks
is to blacklist a host (based on its IP address) after a few
wrong attempts. Furthermore, economic challenges stem from
the radical changes imposed by CON on Internet advertising.
Ads are usually delivered based on website popularity and
user location. Estimating website popularity based
on the number of visits is ineffective in CON, as caching
will "hide" a significant amount of traffic. Yet statistics
such as the average time spent on a particular website,
or the number of visits per day, are very important for
website administrators and ad network operators.

In conclusion, CON might enhance untraceability thanks
to caching and the lack of addresses; however, the resulting
economic and security challenges, among others, remain to be
studied.
4.4 Data authenticity and confidentiality

One of CON's main "selling points" is that security
is built into the data itself, by enforcing content
signatures. Therefore, integrity, provenance, and
trustworthiness of content become built-in features. As keys
can be treated as named CON data, key distribution does not
constitute a major issue. While today's Internet requires a
one-size-fits-all trust model, trust in CON is end-to-end,
between data producer and data consumer, and does not depend
on any physical or temporal frame.

This modularity has two main advantages. (1) Different 
consumers may easily implement different levels of security 
(e.g., blacklisting content that is not signed with a trusted 
key). (2) CON can deploy both widely accepted and new 
trust management models as data is independent from the 
deployed model. 

However, these features also prompt some challenges. 
First, we need to identify a set of usable trust mechanisms 
that can be deployed and used by most users. Second, as
all content is signed, it is crucial to assess (and
potentially improve) the efficiency of signature generation,
transmission, verification, and possibly storage.

Note that encryption in CON is not applied to publicly
available content, thus creating problems related to
data confidentiality. One main disadvantage of current
approaches to data encryption (e.g., TLS) is that they
inhibit caching, thus defeating one of CON's major
advantages for improving network performance. TLS seems to be
at odds with the CON design, as: (1) trust is linked to a
session and not to the content itself, and (2) only one user
can decrypt content, thus inhibiting the caching mechanism.
However, as discussed in Section 3.2, solutions like proxy
re-encryption might mitigate this issue.

As a result, we believe that providing data confidentiality
while preserving the caching mechanism is one of the major
open challenges in CON. Although we have proposed several
countermeasures, most of these rely on public-key
cryptography, so the real-world performance overhead imposed
on both clients and servers needs to be thoroughly
evaluated.

5 Related work 

Propelled by the increasing interest and investment in
future Internet architectures and Content-Oriented
Networking (CON), the research community has produced a
large body of work dealing with CON building blocks [45, 
47, 71, 4, 43], performance [77, 67, 22, 64], and scalabil- 
ity [68, 12]. However, the quest for analyzing and enhanc- 
ing security in CON is only at the beginning. In particular, 
very little work has focused on privacy in CON. In this sec- 
tion, we review relevant prior work. 

Security in CON. Wong and Nikander [75] address the security
of naming mechanisms by constructing the content name as
the concatenation of the content provider's ID, a
cryptographic ID of the content, and some meta-data (see
Section 3.3). Dannewitz et al. [28] adopt a similar approach,
where each content name is defined as the concatenation of
the hash of the public key and a set of attributes. Both
schemes rely on cryptographic hash functions to name the
content, which results in human-unreadable flat names.
Smetters et al. [70] show that these schemes have several
drawbacks, such as the need for an indirection mechanism to
map names and the lack of binding between name and
producer's identity. To resolve these shortcomings, they
propose to keep hierarchical human-readable names while
signing both the content name and the content itself, using
the producer's signing key. Then, Gasti et al. [41] study DoS
and DDoS attacks in CCN [45], presenting attacks and
proposing some initial countermeasures. In another context,
Burke et al. [15] proposed a secure lighting system over
Named-Data Networking (NDN), providing controlled access to
fixtures via authorization policies, coupled with strong
authentication. This approach is a first attempt to port CON
out of the content distribution scenario.

Privacy Issues in CON. To the best of our knowledge, the 
only related privacy study is the recent article by Lauinger 
et al. in [51] and its extended version [50]. It covers both se- 
curity and privacy issues of CCN [45]. Specifically, it high- 
lights a few Denial-of-Service (DoS) vulnerabilities as well 
as different cache-related attacks. In CCN, a possible DoS 
attack (also discussed in [45]) relies on resource
exhaustion, targeting either routers or the content source.
Routers are forced to perform expensive computation, such as
signature verification, which negatively affects the quality
of service and can ultimately block traffic. The content
source can also be flooded with a huge number of interests,
ending up denying service to legitimate users. Additional DoS
attacks mainly
target the cache mechanism, to either decrease network per- 
formance or to gain free and uncontrolled storage. Trans- 
forming the cache into a permanent storage is achieved by 
continuously issuing interests for a desired file. Decreased 
network performance can also be achieved through cache 
pollution. From the privacy perspective, [51] identifies the 
issue of information leakage through caches in CCN [45]. 
A few countermeasures are proposed, following detection 
and prevention approaches. Detection can be achieved us- 
ing techniques similar to those addressing cache pollution 
attacks in IP [29], although such an approach can be difficult 
to port to CON due to the lack of source address. Preven- 
tion can be global, i.e., treating all traffic as sensitive, de- 
laying all traffic, or deploying a shared cache to circumvent 
the attack. Alternatively, a selective prevention approach 
may try to distinguish between sensitive and non-sensitive 
content, based on content popularity and context (time,
location), and then delay or tunnel sensitive content. It is
not clear, however, how to implement the selection mechanism
to distinguish between private and non-private content, but
the authors of [51] suggest implementing this service
either in the network layer (i.e., the router classifies the
content) or at the host (i.e., the content source tags
sensitive content). Such classification is in turn a very
challenging task,
since privacy is a relative notion that changes from one user 
to another. Also, censorship and surveillance are briefly 
discussed, although no countermeasures besides tunneling 
have been proposed. Our work extends that in [51, 50] by 
encompassing all privacy aspects: caching, naming, signa- 
ture, and content. Also, it is more general as it does not 
only consider CCN [45], but CON in general, independently 
of the specific instantiation. Furthermore, when suggesting 
countermeasures, we only propose techniques that can be 
applied with a minimal change to the architecture. 

Anonymity in CON. ANDaNA [30] was recently proposed
as a Tor-like, low-latency anonymizing tool for
CCN [45] to provide provable anonymity. It also aims at
privacy protection via simple tunneling. However, as
discussed in Section 4.1, ANDaNA is an "all-in-one" solution
that introduces latency and impedes caching. Fine-grained
privacy solutions are needed instead, since a widespread
use of tunneling would inherently take away most of CON's
benefits in terms of mobility, performance, and scalability.
To provide censorship resistance, Arianfar et al. [6]
describe an algorithm to mix legitimate sensitive content
with so-called "cover files" to hide it. By monitoring the
content, an adversary would only see the "mixed" content,
which prevents him from censoring the content.

6 Conclusion 

Content-Oriented Networking (CON) proposes a major 
transition from today's Internet to a new content-based ar- 
chitecture. This radical change calls for a thorough analysis 
of both security and privacy guarantees. CON comes with
potential security benefits, including a security-by-design
approach based on digital signatures, which provides not only
data integrity and origin authentication but also all the
necessary machinery to support trust. However, it remained a
compelling open question whether or not, and to what ex- 
tent, this emerging networking paradigm bears new privacy 
challenges. 

In this paper, we presented a first-of-its-kind, systematic 
analysis of privacy issues in CON as a generic paradigm, 
discussing different attacks and detailing their impact on 
user privacy. We also proposed several countermeasures 
while attempting to balance the trade-off between privacy, 
performance, and changes to the architecture. In the pro- 
cess, we identified a number of interesting research chal- 
lenges that call for further work in the area. 

Naturally, our work does not end here. Items for future 
work include: 



• Further evaluating proposed countermeasures and pre- 
senting guidelines for usable and effective deployment. 

• In-depth study of multiple encryption and signature 
techniques and their impact on network performance. 

• Analyzing the impact of privacy-enhancing and CON- 
native technologies on current Internet economy. 